A Novel and Ubiquitous Marine Methylophage Provides Insights into Viral-Host Coevolution and Possible Host-Range Expansion in Streamlined Marine Heterotrophic Bacteria

ABSTRACT The methylotrophic OM43 clade are Gammaproteobacteria that comprise some of the smallest free-living cells known and have highly streamlined genomes. OM43 represents an important microbial link between marine primary production and remineralization of carbon back to the atmosphere. Bacteriophages shape microbial communities and are major drivers of mortality and global marine biogeochemistry. Recent cultivation efforts have brought the first viruses infecting members of the OM43 clade into culture. Here, we characterize a novel myophage infecting OM43 called Melnitz. Melnitz was isolated independently from water samples from a subtropical ocean gyre (Sargasso Sea) and temperate coastal (Western English Channel) systems. Metagenomic recruitment from global ocean viromes confirmed that Melnitz is globally ubiquitous, congruent with patterns of host abundance. Bacteria with streamlined genomes such as OM43 and the globally dominant SAR11 clade use riboswitches as an efficient method to regulate metabolism. Melnitz encodes a two-piece tmRNA (ssrA), controlled by a glutamine riboswitch, providing evidence that riboswitch use also occurs for regulation during phage infection of streamlined heterotrophs. Virally encoded tRNAs and ssrA found in Melnitz were phylogenetically more closely related to those found within the alphaproteobacterial SAR11 clade and their associated myophages than those within their gammaproteobacterial hosts. This suggests the possibility of an ancestral host transition event between SAR11 and OM43. Melnitz and a related myophage that infects SAR11 were unable to infect hosts of the SAR11 and OM43, respectively, suggesting host transition rather than a broadening of host range. IMPORTANCE Isolation and cultivation of viruses are the foundations on which the mechanistic understanding of virus-host interactions and parameterization of bioinformatic tools for viral ecology are based. This study isolated and characterized the first myophage known to infect the OM43 clade, expanding our knowledge of this understudied group of microbes. The nearly identical genomes of four strains of Melnitz isolated from different marine provinces and the global abundance estimations from metagenomic data suggest that this viral population is globally ubiquitous. Genome analysis revealed several unusual features in Melnitz and related genomes recovered from viromes, such as a curli operon and virally encoded tmRNA controlled by a glutamine riboswitch, neither of which are found in the host. Further phylogenetic analysis of shared genes indicates that this group of viruses infecting the gammaproteobacterial OM43 shares a recent common ancestor with viruses infecting the abundant alphaproteobacterial SAR11 clade. Host ranges are affected by compatible cell surface receptors, successful circumvention of superinfection exclusion systems, and the presence of required accessory proteins, which typically limits phages to singular narrow groups of closely related bacterial hosts. This study provides intriguing evidence that for streamlined heterotrophic bacteria, virus-host transitioning may not be necessarily restricted to phylogenetically related hosts but is a function of shared physical and biochemical properties of the cell.

IMPORTANCE Isolation and cultivation of viruses are the foundations on which the mechanistic understanding of virus-host interactions and parameterization of bioinformatic tools for viral ecology are based. This study isolated and characterized the first myophage known to infect the OM43 clade, expanding our knowledge of this understudied group of microbes. The nearly identical genomes of four strains of Melnitz isolated from different marine provinces and the global abundance estimations from metagenomic data suggest that this viral population is globally ubiquitous. Genome analysis revealed several unusual features in Melnitz and related genomes recovered from viromes, such as a curli operon and virally encoded tmRNA controlled by a glutamine riboswitch, neither of which are found in the host. Further phylogenetic analysis of shared genes indicates that this group of viruses infecting the gammaproteobacterial OM43 shares a recent common ancestor with viruses infecting the abundant alphaproteobacterial SAR11 clade. Host ranges are affected by compatible cell surface receptors, successful circumvention of superinfection exclusion systems, and the presence of required accessory proteins, which typically limits phages to singular narrow groups of closely related bacterial hosts. This study provides intriguing evidence that for streamlined heterotrophic bacteria, virus-host transitioning may not be necessarily techniques. The genomic similarity between independent isolates suggests a cosmopolitan global distribution, which was supported by metagenomic read recruitment from global ocean viromes. Genome analysis of Melnitz revealed a two-piece tmRNA gene (ssrA), controlled by a glutamine riboswitch. Riboswitch control of regulation is a feature of streamlined organisms such as OM43 and SAR11 (27), and we show here that it is also a feature of their associated viruses. Like previously reported SAR11 myophages (15,16), Melnitz also encoded the pore proteins of a curli operon that are absent in the host. Structural analysis suggests a putative reconfiguration may allow this phage-encoded protein to serve as a novel gated secretin or pinholin, with gene synteny indicating a role in timing the release of viral progeny. Phylogenetic analysis revealed that both the ssrA gene and tRNA genes encoded by Melnitz were more closely related to those found within the alphaproteobacterial SAR11 host or its associated viruses, than those of its own gammaproteobacterial host. These findings point toward a recent shared ancestor indicative of host transitioning between OM43 and SAR11.

RESULTS AND DISCUSSION
Phage Melnitz infecting Methylophilales sp. H5P1 shares viral gene clusters with Pelagibacter phages. Two bacteriophages were isolated on the OM43 strain H5P1 from two environmental water samples taken from the Western English Channel (WEC) previously (16). Two more phages were obtained from an additional water sample taken at the Bermuda Atlantic Time Series (BATS) station in the Sargasso Sea (Table 1). Phages were successfully purified, sequenced from axenic cultures and assembled into single circular contigs. Comparison to publicly available phage genomes with CheckV (28) suggested that the viral contigs were complete, circularly permuted genomes without terminal repeats. Transmission electron microscopy (TEM) showed straight contractile tails indicative of myophage morphology (see Fig. S1 in the supplemental material). Capsid size is not reported due to the presence of sample preparation artifacts, which may affect accuracy of capsid dimension measurements (29).
All four phage genomes shared 99.95 to 100% average nucleotide identity across their full genomes therefore all four phages should be regarded as the same viral species (30). We named this species "Melnitz" after a character in the popular Ghostbusters franchise, continuing the theme of another previously isolated phage on OM43 (Venkman) (16). For clarity, where individual phages within the population (Melnitz) are specified, they will subsequently be referred to with a numerical suffix (e.g., Melnitz-1). General features of the four phages are summarized in Table 1. Shared gene network analysis (VConTACT2 [31]) using assembled contigs from Global Ocean Viromes (GOV2 [13]) and RefSeq (V88 with ICTV and NCBI taxonomy [32]) was performed to evaluate Melnitz phylogeny. All four Melnitz variants were assigned to a subcluster with joint membership of two clusters: Cluster_2 and Cluster_3 (Fig. 1). Cluster_2 contained an additional 15 viral contigs from nine different GOV2 viromes. Cluster_3 contained three virome contigs shared with Cluster_2, as well as 25 pelagimyophage genomes assembled from metagenomes (PMP-MAVGs) (33). The subcluster containing Melnitz also contained Pelagibacter myophages HTVC008M and Mosig (17,34), suggesting they belong to the same family. A total of 18 contigs clustering with Melnitz isolates were identified from GOV2 and WEC viromes (13,35). Genome alignments (BLASTn) of all contigs sharing the VConTACT2 cluster with Melnitz are provided in Fig. S2 in the supplemental material. Further phylogenetic analysis showed that only two environmental contigs consistently shared a branch with Melnitz (L4_2016_09_28_HYBRID_000000000048 and Station137-DCM-ALL-assembly-NODE1-length-137507-cov-14.215202), based on single shared genes encoding: tail sheath protein (see Fig. S3), a terminase large subunit (see Fig. S4), and scaffolding proteins (see Fig. S5), as well as four concatenated structural genes (Fig. 2). All phages within this viral group were either isolates known to infect streamlined heterotrophs or were previously predicted to do so based on phylogenetic similarity (33).
Metagenomic analysis shows cosmopolitan nature of Melnitz-like phages. To establish the distribution patterns of Melnitz, reads of 131 GOV2 viromes were mapped  (38) were randomly subsampled to five million reads (without replacement) and mapped against phage genomes using the same thresholds. None of the known OM43 phage isolates were identified as present in the samples using a minimum cutoff 40% genome coverage (data not shown). Related SAR11 myoviruses (Mosig and HTVC008M) were also absent, with the exception of 462 RPKM recruited by HTVC008M in a single sample (HSD20-02a-277-S2C009-0200-170214). The low representation of isolated myoviruses for either SAR11 or OM43 in this region might indicate that unknown local viral strains (with at least 60% difference across the genomes) dominate, or that these viruses are rare in the NPSG at all depths. It is also worth noting that while GOV2 and WEC viromes were produced using iron chloride flocculation to concentrate viral particles, those from ALOHA were produced by capturing viral particles on 0.02-mm filters. Neighbor-joining maximum-likelihood tree (500 bootstraps) based on four individually aligned and concatenated structural genes (capsid assembly, major capsid, sheath subtilisin, terminase large subunit) of genomes and contigs that were clustered with OM43 phage Melnitz, with the exception of phages infecting Synechococcus spp. that were used to root the tree. All four Melnitz-like isolates were included but the branch was collapsed for clarity. Branch support values of 1 are not shown. Leaves without labels indicate contigs from the Global Ocean Virome (GOV2) data set (13), which were omitted for clarity. A fully labeled tree is available in Fig. S3 in the supplemental material.
Therefore, it is possible (although unlikely) that the absence of Melnitz-like genomes in the NPSG could result from methodological differences. It has been demonstrated that the marine biosphere maintains persistent bacterial "seed banks" (39), meaning there is a high probability that any given marine bacteria can be found in any marine ecosystem, albeit in extremely low abundance, awaiting favorable conditions for growth. Similarly, many viruses infecting globally distributed bacteria such as SAR11 are found in viromes from all oceans (19), suggesting the majority of viral species are shared between oceans (40). In the case of Melnitz, the isolation of the same virus with up to 100% nucleotide identity across the full genome on three separate occasions in the Western English Channel and BATS station in the Sargasso Sea (;5,000-km distance between sites) suggests either low population-level variance or that the isolation conditions used favor this strain. Nonetheless, the presence of cultivable Melnitz populations at both sites supports the virus seed-bank hypothesis where viral populations are conserved and persistent in the environment, being passively transported across oceans via global currents until favorable conditions select them for propagation (41,42). The "environmental selection" in the case of Melnitz would likely be the enrichment culturing used for isolation, providing enough suitable hosts and nutrients for viral replication. The possibility remains that the Melnitz population at the BATS site is maintained by a resident "seed" population of OM43, though OM43 is seldom reported in microbial communities at BATS.
Genomic characterization of Melnitz-like Methylophilales phages. Genomes of the four Melnitz-like myophage isolates were between 141,372 and 141,552 bp in length, all with a G1C content of 37.6% (Table 1), similar to that of their hosts (34.4%). For each, 224 open reading frames (ORFs) were predicted (using five different gene callers and manual curation [43]), except Melnitz-2, which had two additional small ORFs (NCBI accession numbers QZI94509.1 and QZI94511.1) of unknown function (226 ORFs total). Four additional tRNA sequences for all four genomes were identified (genes 59, 125, 147, and 153) (tRNAScan [44] v2.0, ARAGORN v1.2.38 [45]), and functional annotation (using BLASTp (46) phmmer v.241.1 against PFam [47] and Swiss-Prot [48]) of ORFs suggest that all Melnitz strains encoded the same set of genes. Out of 224 ORFs, 143 (;63%) had unknown function (Fig. 3A). The tail-assembly associated region in the Melnitz-4 genome between bp 99465 and 105981 had nine genes with altered length compared to the equivalents in the other three phages, but where annotation was possible, the genes were predicted to have the same function. Predicted protein structures (PHYRE2 [49]) had low confidence and did not allow for a meaningful structural comparison (data not shown). Though structural variations in tail and receptor genes are often considered to be important factors for defining strainlevel host ranges (50,51), all four Melnitz variants had identical host ranges when screened against a panel of other OM43 isolates from the WEC (Methylophilales sp. strains C6P1, D12P1, and H5P1) (15; data not shown). Melnitz possessed a set of structural genes typically associated with T4-type myophages (52), including T4-like baseplate, tail tubes, base plate wedges, tail fibers, virus neck, tail sheath stabilization, prohead core and capsid proteins. Melnitz encodes orthologs of the auxiliary metabolic genes (AMGs) mazG (gene 10) and phoH (gene 81), which are involved in cellular phosphate starvation induced stress responses and are a common feature of phages from P-limited marine environments (53)(54)(55). In addition, hsp20 (gene 223) was identified, which, together with mazG and phoH, is considered part of the core genes in T4-like cyanomyophages (53,56). Both Melnitz and its host, Methylophilales sp. H5P1, encode a type II DNA methylase to potentially protect DNA against cleavage by restriction endonucleases (57). In T-even phages, DAM methylase (similar to type II DNA methylase) protects phage DNA from restriction endonucleases through competitive inhibition (58); thus, DNA methylase in Melnitz (gene 5) could also be involved in protection from host restriction endonucleases during infection.
For DNA replication and metabolism, Melnitz encodes nrdA/nrdE (KEGG Orthology KO00525) and nrdB/nrdF (KEGG Orthology KO00526) genes (genes 39 and 41) that together form active ribonucleotide reductases for catalyzing the synthesis of deoxyribonucleoside triphosphates required for DNA synthesis (59). Additional DNA replication-and manipulation-associated genes were polB (DNA polymerase beta, gene 4), recA (DNA recombinase, gene 11), and dnaB (helicase-like, gene 14), as well as two DNA primases (gene 13 and 38) and a 2OG-Fe oxygenase (gene 12). The rpsU gene (gene 60, encoding the bacterial 30S ribosomal subunit 21S) was found downstream adjacent to gene 59-a tRNA-Arg (tct). Virally encoded 21S genes were previously identified in the Pelagibacter phages HTVC008M (17) and Mosig (34), as well as the OM43 phage Venkman (16). It is thought that the 21S subunit in phages might be required for initiating polypeptide synthesis and mediation of mRNA binding (60); the proximate tRNA sequence may support a translational and protein synthesis role of the viral ribosomal gene. Within the same region, Melnitz's Gene_23 was annotated as a putative surface layer-type (S-layer) protein, previously associated with a strong protection mechanism against superinfection in bacteriophages of Bacillus spp. and Pseudomonas aeruginosa (61)(62)(63). Though protection against superinfection is sometimes considered to be more important in temperate phages, the lytic Escherichia coli T4-like phage Spackle also possesses structural defense mechanisms against superinfection from closely related phages (64). Therefore, we postulate that phage-encoded S-layer proteins in Melnitz may be used to alter host cell surface receptors and thereby protect against superinfection from viral competitors.
Temporal regulation of T4-like phages primarily occurs at the transcriptional level and requires concomitant DNA replication and promoter/terminator sequences to organize into early, middle and late stages (65). In Melnitz, a total of 46 promoters (31 early and 15 late) and six terminator sequences were identified-considerably fewer than the .124 promoter sequences found in T4 (52,66). Like other marine T4-like phages, such as those infecting streamlined cyanobacteria (67), Melnitz lacked identifiable middle promoters. Reduced numbers of regulatory elements are a common feature of genomically streamlined bacteria such as SAR11 and OM43 (68). Therefore, we propose that temporal regulation of Melnitz is similarly simplified as a result of host streamlining. An alternative hypothesis is that additional (middle-) promoter sequences in Melnitz are too divergent from known sequences for detection.
Like other T4-like phages, Melnitz did not encode an RNA polymerase, relying on host transcription machinery after infection for synthesis of phage proteins. Melnitz possessed the essential T4-like transcriptional genes: s 70 -like (gene 215), gp45-like sliding clamp C terminal (gene 226), clamp loader A subunit (gene 91), transcription coactivator (gene 90), and a translational regulator protein (gene 1). In T4, the DNA sliding clamp subunit of a DNA polymerase holoenzyme coordinates genome replication in late-stage infection, forming an initiation complex with a s -factor protein and host RNA polymerase (69). To activate it, the sliding clamp is loaded by a clamp-loader DNA polymerase protein complex, allowing the complex to move along DNA strands (70,71). As the same genes were found in Melnitz it is likely using them to form a T4-like protein complex for transcription and genome replication.
Marine Melnitz-like phages may use glutamine riboswitches to regulate genome expression. Stalling of ribosomes can occur when mRNAs lack stop codons (nonstop mRNA), which may cause sequestration of ribosomes and production of defective polypeptides. ssrA encodes a tmRNA that in bacterial trans-translation, together with smpB and ribosomal protein S1, is important to release ribosomes that have stalled during protein biosynthesis. An additional role of tmRNA is to add a hydrophobic peptide tag onto nonstop mRNA, instigating proteolysis of the polypeptide (72). Permuted two-piece tmRNA are transcribed as a single precursor RNA but split into two RNA molecules (comprising a tRNA-like domain and an mRNA-like domain) (73). Two-piece tmRNA are found in most Alphaproteobacteria, and some groups of Gammaproteobacteria and Cyanobacteria (74), whereas other bacteria encode one-piece tmRNA. In Melnitz, the ssrA gene (gene 86) encoding a two-piece tmRNA was situated on the same operon and directly upstream of the transcription coactivator (Fig. 3B). Host-encoded ssrA can be used as sites of integration for prophages in deep-sea Shewanella isolates (62), and fragments of ssrA have previously been identified in prophage genomes (63), most likely as a result of imprecise prophage excision that transfers host genetic material to the excising phage. Though we did not find integrases or other evidence that Melnitz is able to integrate into host genomes, a possible gene exchange with a temperate prophage may explain the presence of a complete ssrA gene in Melnitz. A possible role for a complete viral tmRNA is to help maintain the hijacked bacterial machinery. Alternatively, the phage might use tmRNA to selectively tag and degrade host proteins to recycle amino acids for viral protein synthesis. Since the tmRNA is located immediately downstream of an early promoter, we postulate that this occurs early during infection to minimize degradation of viral proteins. We evaluated the frequency of ssrA genes found in 18,146 publicly available genomes in the phage genome database at millardlab.org (last updated on 21 January 2021), which includes all RefSeq genomes (75). We also evaluated whether those identified were either one-piece or two-piece tmRNAs. Only 402 phages (2.3% of all available genomes) encoded 133 unique tmRNA genetic structures. Of these, 53 were suggested to be two-piece tmRNA by ARAGORN (45), of which 50 were encoded by phages isolated from marine or aquatic samples (see Table S1). In contrast, only 2.5% of one-piece tmRNAs were marine, suggesting that two-piece tmRNA is a feature more prevalent in marine and aquatic phages compared to phages from other environments. Of the 47 contigs (27 of which were classed as complete by CheckV) that were clustered with Melnitz based on shared gene content (Fig. 1), 17 had permuted tmRNA sequences, indicating that ssrA encoded tmRNA is a common, but not defining feature of this viral group.
The 39 domain of the ssrA gene in Melnitz encodes a predicted glutamine riboswitch that resembles glnA RNA motifs found in cyanobacteria and other marine bacteria (76). Riboswitches are a common regulatory mechanism in streamlined marine bacteria due to their low metabolic maintenance cost compared to protein-encoded promoters and repressors (68). Like tmRNAs, riboswitches are a rare feature in phages. A phage-encoded riboswitch putatively controlling regulation of psbA was previously identified in a cyanophage (71). Of the 133 isolate phage genomes encoding tmRNAs only two phages other than Melnitz possessed a riboswitch (both glutamine): Pelagibacter phage Mosig (34) and Prochlorococcus phage AG-345-P14 (77). Additional riboswitches were identified in metagenomic assembled virome contigs. In 10 metagenomically assembled viral genomes of pelagimyophages thought to infect SAR11 (PMP-MAVGs) and five GOV2 contigs related to Melnitz, only one of these contigs had both tmRNA and a riboswitch; 11 PMP-MAVGs encoded neither. This suggests that (i) glutamine riboswitches are a common but not defining feature in Melnitz-like marine phages and that (ii) phages of that group use riboswitches or tmRNA, but rarely both, with the only two identified examples occurring in isolated phages, not metagenomically derived genomes. Curiously, the bacterial OM43 host of Melnitz (H5P1) does not encode a glutamine riboswitch; only one cobalamin riboswitch was found located upstream of the Vitamin B12 transporter gene btuB. Other members of the Methylophilaceae (OM43 strains HTCC2181, KB13, and MBRSH7, as well as Methylopumilus planktonicus, M. rimovensis, and M. turicensis) also lack glutamine riboswitches. In cyanobacteria, the glutamine riboswitches were previously found to regulate the glutamine synthase glnA and are strongly associated with nitrogen limitation (78). Phages infecting Synechococcus have been shown to use extracellular nitrogen for phage protein synthesis (79), which might indicate a similar role for phage-encoded glutamine riboswitches in Melnitz. However, neither OM43 nor Melnitz encode the glutamine synthase glnA or homologous genes required for glutamine synthesis. This suggests that Melnitz-like phages use viral riboswitches to regulate their own genes rather than hijacking the cellular machinery.
Phage-encoded curli operons may be involved in regulating cell lysis. Curli genes are typically associated with the bacterial production of amyloid fibers that are part of biofilm formation. The CsgGF pore spans the outer membrane as part of a type VIII secretion system, allowing for the secretion of CsgA and CsgB that assemble into extracellular amyloid fibers (80). In complete curli modules, CsgG and CsgF form an 18mer heterodimer comprising nine subunits with 1:1 stoichiometry between CsgF and CsgG. The structure of the pore is dictated by CsgG, which forms a channel ;12.9 Å in diameter. CsgF forms a secondary channel ;14.8 Å in diameter at the neck of the beta barrel ( Fig. 4A; see also Fig. S8A and B) and assists in excretion of the amyloid fiber. Like previously described pelagimyophages (33), Melnitz possesses csgF and csgG, but lacks the genes for amyloid fiber production: csgA and csgB (Fig. 3). Zaragoza-Solas et al. speculated that phage-encoded curli pores may allow for the uptake of macromolecules, or together with unidentified homologues of missing curli genes form a complete, functional curli operon producing amyloid fibers for "sibling capture" of proximate host cells (33). However, we saw no evidence of "clumping" as a result of sibling capture, neither in cytograms nor TEMs, under the culturing conditions used in this study. In addition, similar to results in pelagimyophages, we did not find evidence for a complete phage encoded curli biogenesis pathway in Melnitz, nor does the bacterial H5P1 host encode any curli-associated genes, suggesting that these genes were acquired by an ancestral strain of Melnitz and pelagimyophages before host transition to OM43 and SAR11, respectively. Since streamlined genomes would be expected to lose residual or otherwise unnecessary genes encoding incomplete machinery (27), it is unlikely that curli-associated phage encoded genes supplement a partial operon in the host. The presence of curli genes across multiple phage species suggests that their presence provides the virus with a competitive advantage that has yet to be identified.
Structural homology modeling of the Melnitz-encoded CsgGF pore with Swiss-Model (81) yielded low structural similarity to known CsgGF structures (GMQE , 0.5, with Q-MEAN scores ,-3 and z-scores )2 outside normalized Q-mean score distributions from a nonredundant set of structures from PDB). Structural similarity was greatest in regions 238 to 258 of CsgG where Q-MEAN scores were ;0.7, with conserved regions in the periplasmic-facing a-helices of CsgG. Melnitz-encoded CsgF showed poor structural similarity to any models available within Swiss-Model. Therefore, we used AlphaFold2 (82) to predict the structure of phage-encoded CsgG and CsgF de novo and compared predicted structures to known CsgGF structures. The predicted structure of Melnitz-encoded CsgG showed close structural similarity to known CsgG, with two notable exceptions: (i) a narrowing of the barrel at the neck and (ii) an extension of the a-helix outside the barrel (Fig. 4B and C). In contrast there was little structural similarity between Melnitz-encoded CsgF and the structure of CsgF in E. coli, barring a shared a-helix domain. Phage-encoded CsgF comprised two long a-helices ending in a b-sheet (Fig. 4D). Similar modeling of CsgG and CsgF encoded by pelagiphage HTVC008M also contained these features (see Fig. S9A and B, respectively). Assuming the pore orientation in an infected OM43 cell is the same as that of CsgG in E. coli, the additional domains on phage-encoded CsgF would either place the terminal b-sheet inside the barrel of CsgG (unlikely due to steric hindrance) or the extended structure would point outwards into the extracellular milieu, in an unusual conformation with unknown function. It is more likely that in this configuration of CsgG, phageencoded CsgG and CsgF have diverged and evolved independently. Therefore, they may no longer form a heterodimer, with CsgG retaining the function as a pore, and CsgF evolving independently to provide an alternative, unknown function. Indeed, the narrowing of the barrel neck in Melnitz-encoded CsgG creates a channel 14.7 Å in diameter, matching that provided by CsgF in E. coli (see Fig. S8B and C). One possibility is that phage-encoded CsgG is functionally analogous to pinholins, used by phage l to regulate cell lysis. Pinholins form channels ;15 Å in diameter in the inner membrane, resulting in rapid membrane depolarization and subsequent activation of membranebound lysins (83).
An alternative hypothesis is that in Melnitz, CsgGF forms a heterodimer in the outer membrane similar to CsgGF in E. coli, but that the structure is inverted so that the extended a-helices of CsgF point into the periplasm. In this conformation, the extension from Melnitz-encoded CsgG (Fig. 4C) might interact to stabilize the elongated hinged a-helices of CsgF. The C-terminal b-sheets of CsgF could then form a second channel beneath the CsgG barrel (see Fig. S10A). In this conformation, CsgGF is structurally similar to a secretin, a large protein superfamily used for macromolecule transport across the outer membrane such as DNA for natural competence (e.g., PilQ within the type IV secretion system in Vibrio cholerae) or extrusion of filamentous phages during chronic infection (e.g., pIV in bacteriophages Ff) (84). Like CsgGF in an inverted conformation, PilQ comprises a b-barrel, with a secondary pore below, attached by two hinged a-helices. The inner surface of PilQ is negatively charged to repel the negatively charged backbone of DNA and assist in transportation across the membrane. In contrast, the inner surface of the CsgGF channel is predicted to be positively charged and narrower (14.7 Å compared to 20.6 Å in PilQ), making a role in DNA transport across the membrane unlikely. While the function of phage-encoded CsgGF is not yet clear, the additional domains of CsgF and the lack of other curli genes within either the phage or the host suggest that it is a novel pore structure whose function has diverged from ancestral CsgGF and would be a worthy target for future structural resolution. Using cryo-electron tomography could resolve the cell surface and curli protein structure or provide evidence of extracellular amyloid fibers and/or curli pores.
Whether the CsgGF complex in Melnitz acts as a pinholin or a secretin, gene synteny supports a putative role in regulation of the timing of cell lysis. First, genes csgGF are located immediately downstream of thymidylate synthase thyX and ssrA and transcription coactivator genes (Fig. 3). Overexpression of thymidylate synthase in the D29 phage infecting Mycobacterium tuberculosis results in delayed lysis and higher phage yields (85). In phage T4, postponing lysis is used to delay the release of viral progeny until conditions are favorable, thereby maximizing virion production before viral release and successful replication after (86). In the T4-like Melnitz, thyX could therefore be involved in postponing lysis as well. Lytic control may putatively be related to the upstream glutamine riboswitch. Host H5P1 lacks glutamine synthesis pathways but has a complete peptidoglycan biosynthesis pathway necessary to produce peptidoglycans from glutamine. Therefore, both the H5P1 host and Melnitz phage are restricted to using glutamine for protein and peptidoglycan synthesis only. In phage l, the depletion of peptidoglycan precursors can trigger lysis through activation of a spanin complex (87). We speculate that during the course of the infection cycle, glutamine levels are kept low through incorporation into viral proteins. When protein synthesis is complete, intracellular glutamine levels increase, activating the glutamine riboswitch. This in turn could trigger an opening of the curli pore, which may result in rapid depolarization of the membrane, triggering activation of lysins and rapid cell lysis.
Structural homology modeling (Swiss-Model) of a Melnitz-encoded enzyme (gp67), putatively annotated as a glycosyl hydrolase, revealed structural similarity (38% sequence identity at 98% coverage, with a global model quality estimate [GMQE] of 0.82) to autolysin SagA encoded by Brucella abortus (PDB model 7DNP). SagA acts to generate localized gaps in the peptidoglycan layer for assembly of type IV secretion systems (88). Therefore, it is likely that gp67 and CsgGF work in concert in Melnitz. Peptides involved in binding peptidoglycan at the active site (Glu17, Asp26, Thr31) were conserved between Melnitz gp67 and SagA (see Fig. S11) (89). Outside this active site, structural similarity to T4 endolysin (PDB model 2561.1) was low (13% identity over 51% coverage; GMQE = 0.18). Melnitz lacked any other lysin-like genes. Like the T4-encoded endolysin e (90), gp67 is under the control of an early transcription promoter (Fig. 3B). We therefore propose that the gp67 gene in Melnitz potentially serves two functions: (i) as an endolysin for degradation of cellular peptidoglycans during lysis or (ii) as an autolysin to enable assembly of the CsgGF or CsgG-only pore protein. Phages infecting Streptococcus pneumoniae upregulate hostencoded autolysins alongside phage-encoded lysins during late-stage infection to accelerate cell lysis (91). The close structural match between gp67 and SagA suggests that an ancestor of Melnitz acquired a host-encoded autolysin as an alternative lysin during its evolutionary history. Whether this autolysin-derived phage lysin could be activated by membrane depolarization through CsgG is unknown.
Melnitz-encoded genes suggest a possible host transition event from SAR11 to OM43. Phage host ranges are largely determined by interactions between host receptor proteins and phage structural proteins such as tail fibers that enable the phage to adsorb and inject its genetic material. Mutation in phage proteins, either through point mutations or recombination during coinfection can result in host range expansion or transition (92). Host range expansion or transition within species boundaries is more common, but rare transition events between hosts from different genera can occur through mutations in tail fibers (93). Here, two separate lines of evidence converge to suggest that Melnitz underwent a recent host transition from SAR11 to OM43. First, ssrA encoded by Melnitz was identified to be more closely related to homologs within the alphaproteobacterial lineage. Two-piece circularly permuted tmRNA is common in three major lineages: Alphaproteobacteria, Betaproteobacteria (now part of Gammaproteobacteria [21]), and Cyanobacteria (73,74). The phylogenetic evidence for alphaproteobacterial ssrA in Melnitz rather than ssrA that matches the gammaproteobacterial lineage of its OM43 host suggests that this gene was acquired from an Alphaproteobacteria (Fig. 5). Second, two of the four tRNAs encoded by Melnitz were more closely related to tRNAs encoded by pelagimyophages than those of their host. Melnitz encodes two versions of tRNA-Arg(TCT): one most closely related to that of its host H5P1 and one closely related to Pelagibacter phage Mosig. Phage-encoded tRNAs enable large phages to sustain translation as the host machinery is degraded to fuel phage synthesis (94). Phage-encoded tRNAs also enable phages to optimize protein synthesis in a host with different codon usage and thus serve as both a marker of increased host range and evolution through different hosts (95). Indeed, tRNAs are often used for computational host prediction of phages due to high sequence conservation of the gene between host and viral forms (96,97). We postulated that the host range of Melnitz would be evident in the four tRNAs found within its genome and reflect potential hosts in the OM43 clade. Alternatively, if a recent host transition occurred, as ssrA phylogeny suggests, tRNAs would be similar to those found in the SAR11 bacteria and associated viruses. A search for tRNA genes in the genomes of isolated Pelagibacter phages and PMP-MAVGs (47 genomes) and isolated phages infecting OM43 (three genomes) identified tRNAs in seven Pelagibacter phage genomes. Melnitz was the only phage known to infect OM43 that encoded tRNAs, with four tRNAs in total (Fig. 6). A list of tRNAs used is provided in Table S2. Sequences encoding tRNAs were aligned using all-versus-all BLASTN with a minimum expect-value of 1 Â 10 25 . Two of four tRNAs in FIG 5 Phylogeny of tmRNA genes in major marine lineages. Neighbor-joining maximum-likelihood tree (100 bootstraps) of tmRNA genes found in marine phages and host lineages (not exhaustive) suggests that three the three known major lineages between Cyanobacteria, Gammaproteobacteria, and Alphaproteobacteria are shared with their associated phages, except for OM43 phage Melnitz (infecting H5P1 on the gammaproteobacterial branch), which has a tmRNA gene more closely related to genes found in Alphaproteobacteria and their phages.
Melnitz were tRNA-Arg (TCT), the first aligning most closely with its H5P1 host. The second tRNA-Arg best aligned with the tRNA gene found in the Pelagibacter phage Mosig. Its alignment to bacterial tRNA matched OM43 strains H5P1 and HTCC2181, respectively (.90% identity). The third tRNA-Leu found in Melnitz aligned with PMP-MAVG-17, previously classified as Pelagibacter phage (33) but only had 45% nucleotide identity. The fourth tRNA-Trp (CCA) found in Melnitz did not align (E values . 1 Â 10 25 ) with any tRNA found in OM43, SAR11, or any of their respective phages. Using tRNA as host indication in Melnitz therefore reflects its OM43 host but also indicates genetic exchange with SAR11 virus-host systems. Furthermore, the related Pelagibacter phage Mosig possessed tRNAs matching OM43 and Melnitz, as well as a second tRNA aligning with its SAR11 host (93%). This may suggest either a relatively recent genetic exchange between OM43 and SAR11 virus-host systems, and/or a surprisingly broad host range for Pelagibacter phage Mosig and Methylophilales phage Melnitz. Full-length genome alignments of Melnitz and HTVC008M (see Fig. S12) support this hypothesis, since rare genetic features such as the curli operon, as well as gene synteny, are conserved in both species. We speculate that a recently shared host which led to the acquisition of these genes followed by viral speciation is more likely than two separate events of horizontal gene transfer of the same genetic module.
To assess the breadth of host ranges, we challenged the SAR11 strains HTCC7211 and HTCC1062 and OM43 strains H5P1, D12P1, and C6P1 (16) with the phages Melnitz and Mosig (34), but found no evidence that either phage is able to replicate in cells beyond host class boundaries as observed via cell lysis (see Fig. S13). Melnitz only infected D12P1 and H5P1, but not C6P1 or any SAR11, whereas Mosig only caused lysis in the Pelagibacter ubique HTCC1062. Though the limited number of available bacterial strains could have missed potential permissive hosts in a different subclade, there is no evidence that noncanonical matching tRNA between SAR11/OM43 and their phages is due to a speculative broad host range in these systems, and instead supports the phylogenetic evidence we presented for a recent host transition event.
Phage host range expansion and transition within closely related (i.e., between strains) is a common feature of phage evolution and has been shown to occur in vitro via homologous recombination between coinfecting phages (92,98). In contrast, expansion and transition between distantly related taxa is rare but has been previously observed in vitro. Naturally occurring, low-abundance mutants of T4 transitioned from E. coli to Yersinia pseudotuberculosis through modification of tail fiber tips in gene 37, yielding variants that could infect both hosts and others that could only infect Y. pseudotuberculosis (93). Host-prediction of viral contigs from metagenomes has identified rare phages (115 of 3,687) possibly able to infect multiple classes (99). We propose that two properties of SAR11 and OM43 increase the likelihood of such an event in natural communities. (i) Both SAR11 hosts and their FIG 6 Alignment of tRNA genes found in Melnitz, SAR11 and OM43 lineages. Heatmaps and dendrograms (Euclidian similarity matrices) were prepared based on similarity between alignments for arginine (Arg), leucine (Leu), and tryptophan (Trp) tRNA genes found in OM43 phage Melnitz, OM43, and SAR11, as well as tRNA derived from isolated SAR11 and OM43 phages. associated phages possess extraordinarily large, globally ubiquitous effective population sizes, making even extreme rare events likely. This study has shown that phages infecting OM43 can also be abundant at higher latitudes, further increasing the likelihood of rare strain variants to occur. (ii) Both OM43 and SAR11 hosts share highly streamlined genomes with elevated levels of auxotrophy, minimum regulation and similar G1C content, shaped by selection pressure to maximize replication on minimum resources in nutrient-limited marine environments (20,100). Both fastidious hosts can be cultured on identical minimal medium, as long as additional methanol is provided as a carbon source for the methylotrophic OM43 (16). Therefore, the phenotypic difference between OM43 and SAR11 might be smaller than suggested by their taxonomic classification. Given viral selection occurs at the phenotypic level, we propose the possibility of a rare event where a mutant of a T4-like phage infecting SAR11 was able to successfully adsorb and inject its genome into an OM43 host, possibly coinfected with a methylophage that enabled a host-transition event through homologous recombination. Though homologous recombination for SAR11 and/or OM43 virus-host systems have not been shown before, similar examples exist for lambdoid E. coli phages (101) and in marine cyanophages high rates of recombination have been reported (102). Once a transition event occurred, the phage likely rapidly evolved specialism on the new host, losing the ability to infect the original host, as demonstrated in host range experiments. Compared to copiotrophic, R-strategist microbial taxa such as E. coli and Vibrio spp., very little is known about the processes governing viral host range and coevolution in Kstrategist bacteria such as the genomically streamlined taxa that dominate oligotrophic oceans. The early evidence shown here may suggest that phages can transition to new hosts from distantly related taxa in natural communities. Such events would explain in part the diverse host ranges predicted among viruses within gene-sharing clusters (31,103).
Conclusion. Here, we provide evidence that supports a putative interclass host transition event between two important clades of streamlined marine heterotrophs and expands our knowledge about the dynamics and characteristics in genomically streamlined heterotrophic virus-host systems. We isolated four nearly identical strains of the new myophage Melnitz from subtropical and temperate marine provinces infecting the important methylotrophic OM43 clade, which we showed to be closely related to myophages infecting the abundant SAR11 clade. The analysis of metagenomic data sets provides evidence that this phage group is ubiquitous in global oceans despite relatively low overall abundance, supporting the viral seed bank hypothesis. Our genomic analysis of Melnitz revealed an incomplete curli module similar to reported curli pores in pelagimyophages, representing a rare and intriguing protein dimer that is absent in their respective host clades. We propose that these virally encoded curli pores may have been repurposed as a functional analogue for the regulation of viral lysis. We also identified an ssrA gene encoding a complete viral tmRNA protein controlled by a glutamine riboswitch, showing that virus-host interactions can be regulated through riboswitches reflecting the extensive use of riboswitches in streamlined marine heterotrophs. Further phylogenetic analysis showed that the ssrA gene is related to the alphaproteobacterial SAR11 lineage, not the gammaproteobacterial OM43 lineage, providing evidence for host transition events in natural marine microbial communities, which was supported by the alignment of viral and bacterial tRNA genes of both lineages. These findings support the conclusion that in heterotrophic streamlined virus-host systems evolution of viral diversity is likely to be driven by host transition and expansion between closely related phages infecting hosts across broad taxonomic groups, likely increasing mosaicism and genetic exchange.
Water sources and phage isolation. Water samples were collected from two different stations: station L4 in the Western English Channel (WEC; 50°159N, 4°139W) and the Bermuda Atlantic Time Series sampling station (BATS; 31°509N, 64°109W) in the Sargasso Sea. For each sample, we used Niskin bottles mounted on a CTD rosette to collect 2 L of seawater at 5 m depth (Table 1) into acid-washed, sterile, polycarbonate bottles. To obtain a cell-free fraction, water samples were filtered sequentially through a 142 mm Whatman GF/D filter (2.7 mm pore size), a 142 mm, 0.2 mm-pore polycarbonate filter (Merck Millipore) and a 142 mm, 0.1 mm-pore polycarbonate filter (Merck Millipore) using a peristaltic pump. The cell-free viral fraction was concentrated to about 50 mL with tangential flow filtration (50R VivaFlow; Sartorius Lab instruments, Gottingen, Germany) and used as inoculum in a multistep viral enrichment experiment followed by dilution-to-extinction purification as described previously (16). Briefly, viral inoculum (10% [vol/vol]) was added to 96-well Teflon plates (Radleys, UK) with exponentially growing host cultures. After a 1-to 2-week incubation period, cells and cellular debris were removed with 0.1 mm syringe polyvinylidene difluoride filters. The filtrate was added as a viral inoculum to another 96-well Teflon plate with exponentially growing host culture. This process was repeated until viral infection was detected by observing cell death using flow cytometry (16). Phages were purified by dilution-toextinction methods (our detailed protocol is available here https://dx.doi.org/10.17504/protocols.io.bb73irqn).
Assessment of viral host ranges. An acid-washed (10% hydrochloric acid) 48-well Teflon plate was prepared with 2 mL per well of ASM1 amended with 1 mM methanol and 5 mL of an amino acid mix (MEM amino acids [50Â] solution). Wells were inoculated to 1 Â 10 6 cells mL 21 with Pelagibacter hosts HTCC1062 or HTCC7211 or with OM43 host C6P1, D12P1, and H5P1 cultures in replicates of three for each of the Mosig and Melnitz phages, plus one set of no-virus controls for each host. The wells marked for viral infections were infected with 200 mL of viral culture. The plate was then incubated at 15°C and monitored daily using flow cytometry for ;2 weeks. Successful infections were identified by observing cell lysis in virus amended wells compared to no-virus controls (15).
Phage DNA preparation, genome sequencing, and annotation. For each viral isolate, 50 mL OM43 host cultures were grown in 250 mL of acid-washed, polycarbonate flasks and infected at a cell density of about 1 Â 10 6 to 5 Â 10 6 cells mL 21 with 10% (vol/vol) viral inoculum. Infected cultures were incubated at 15°C for 7 to 14 days, and then the lysate was transferred to 50 mL Falcon tubes. Larger cellular debris was removed by centrifugation (GSA rotor; Thermo Scientific, catalog no. 75007588) at 8,500 rpm/10,015 Â g for 1 h. Supernatant was filtered through 0.1 mm-pore-size polyvinylidene difluoride syringe filter membranes to remove any remaining smaller cellular debris. Phage particles were precipitated using a PEG8000/NaCl flocculation approach (105). Briefly, 50 mL of lysate was amended with 5 g of PEG8000 and 3.3 g of NaCl (Sigma), shaken until dissolved, and incubated on ice overnight. . Reads were quality controlled, trimmed, and error corrected with the tadpole function (default settings) within BBMap v38.22 (106; available at https://sourceforge.net/projects/bbmap/). Contigs were assembled using SPAdes v.3.13 with default settings (107). Viral contigs were confirmed with VirSorter v1.05 (108) (categories 1 or 2, .15 kbp). The quality and completeness of the contigs, as well as the terminal repeats, were evaluated using CheckV v0.4.0 with standard settings (109). Open Reading Frames (ORFs) were identified with phanotate v2019.08.09 (110) and imported into DNA Master for manual curation (43) using additional ORF predictions made by GenMarkS2 (111), GenMark.heuristic (112), Prodigal v2.6.3 (113) and Prokka v1.14.6 (114). ORFs were functionally annotated using BLASTp against NCBI's nonredundant protein sequences (46), phmmer v2.41.1 against Pfam (47), and Swiss-Prot (48). All genes called were listed and compared using a scoring system evaluating length and overlap of ORFs, as well as the quality of annotation (43). tRNA and tmRNA were identified with tRNAScan-SE v2.0 (44) and ARAGORN v1.2.38 (45). Genomes were scanned for riboswitches using the web application of Riboswitch Scanner (115,116). FindTerm (energy score , 211) and BPROM (LDF . 2.75) from the Fgenesb_annotator pipeline (117) were used to predict promoter and terminator sequences, using default parameters. The s 70 promoters predicted this way were considered early promoters. Known T4-like late promoter sequence 59-TATAAAT-39 (52,71) and middle-promoter MotA box (TGCTTtA)-dependent middle promoters were used as a query for a BLASTN search over the whole genome. Promoters and/or terminators were excluded if they were not intergenic or not within a 10-bp overlap of the start/end of ORFs.
Host DNA preparation, genome sequencing, and annotation. Host cultures were grown in 50 mL of ASM1 medium amended with 1 mM methanol in 250 mL polycarbonate flasks. Upon reaching maximum cell density, genomic DNA was extracted using a Qiagen DNeasy PowerWater kit (14900-50-NF) from biomass retained on 0.1 mm pore-size PC filters according to the manufacturer's protocol, with minor modifications to increase the yield. The bead-beating step was lengthened from 5 to 10 min. DNA elution was performed with a 2 min incubation with elution buffer warmed to 55°C. Nextera DNA libraries were prepared and sequenced by MicrobesNG (Birmingham, UK) for Illumina short-read sequencing on the HiSeq2500 (Illumina paired end [2 Â 250 bp], targeting 30-fold coverage). In addition, long-read sequencing was prepared using a MinION flow cell. Reads were quality controlled, trimmed, and error corrected with the tadpole function (default settings) within BBMap v38.22 (106; available at https:// sourceforge.net/projects/bbmap/). Contigs were assembled using SPAdes v.3.13 with default settings (107). Gene calls were made with PROKKA v1.14.6 (114) and submitted to BlastKOALA (118) for further annotation and prediction of KEGG pathways using the Methylophilales strain HTCC2181 as a reference.
Structural analysis of a putative endolysin. The predicted amino acid sequence of gene product 67 (gp67) was used as a query to identify putative structures on the Swiss-Model server (81) using BLAST and HHBits. Putative models were downselected based on suitable quaternary structure properties and GQME . 0.7. Autolysin SagA from Brucella abortus (PDB model 7NDP.1) was selected as the best hit and used for subsequent modeling of the structure of gp67. Models and associated figures were visualized in PyMOL v.2.5.1 (https://pymol.org/2/).
Structural analysis of CsgGF. The predicted amino acid sequences of CsgG and CsgF from Melnitz were used as a query to identify putative structures on the Swiss-Model server using BLAST and HHBits. The best hit was determined by predicted quaternary structure properties. Global GQME scores were ,0.5, and Q-MEAN scores identified the best-predicted structures as unreliable (range, 23.28 to 25.11), although localized regions had Q-MEAN scores of .0.7. Therefore, to improve structural predictions, amino acid sequences for CsgG and CsgF were independently run through AlphaFold2 using the available Colab web interface (https://colab .research.google.com/drive/1LVPSOf4L502F21RWBmYJJYYLDlOU2NTL). Structures were determined both with or without postprediction relaxation (use_amber) and use of MMSeqs2 templates (use_templates). Since no noticeable differences were observed between these runs, we selected the top scoring unrelaxed model for downstream comparison to known CsgG and CsgF structures. Predicted structures from AlphaFold2 were downloaded, visualized and aligned to E. coli CsgGF (7NDP) in PyMOL v.2.5.1. CsgGF from pelagimyophage HTVC008M was similarly analyzed with AlphaFold2 to confirm structural similarity to that of Melnitz. Structural prediction of the CsgGF heterodimer in Melnitz was assumed to conform to the 18-mer structure of CsgGF in E. coli. Scripts for generating structures in PyMOL are available (https://github.com/HBuchholz/Genomic -evidence-for-inter-class-host-transition-between-streamlined-heterotrophs). Electrostatic potential of protein surfaces was calculated and visualized using the APBS Electrostatics plugin available within PyMOL.
Transmission electron microscopy. For ultrastructural analysis, bacterial cells and/or phages were adhered onto pioloform-coated 100 mesh copper EM grids (Agar Scientific, Standsted, UK) by floating grids on sample droplets placed on parafilm for 3 min. After 3 Â 5 min washes in droplets of deionized water, structures were contrasted on droplets of 2% (wt/vol) uranyl acetate in 2% (wt/vol) methyl cellulose (ratio, 1:9) on ice for 10 min, the grids were picked up in a wire loop, and excess contrasting medium was removed using filter paper. The grids were then air dried, removed from the wire loop, and imaged using a JEOL JEM 1400 transmission electron microscope operated at 120 kV with a digital camera (Gatan, ES1000W, Abingdon, UK).
Data availability.